Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking

Heck, Linus, Macák, Filip, Češka, Milan, Junges, Sebastian

arXiv.org Artificial Intelligence

The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to be robust, i.e., to perform well on perturbations of the MDP, and (2) to satisfy additional structural constraints regarding, e.g., their representation or implementation cost. Computing such robust and constrained policies is computationally more challenging. This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework. We achieve flexibility by allowing constraints to be expressed in a first-order theory over a set of MDPs, while the root of our efficiency lies in the tight integration of satisfiability solvers, which handle the combinatorial nature of the problem, with probabilistic model checking algorithms, which handle the analysis of MDPs. Experiments on a few hundred benchmarks demonstrate the feasibility of constrained and robust policy synthesis and competitiveness with state-of-the-art methods for various fragments of the problem.
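The baseline problem this abstract builds on — computing a reward-optimal policy for a known finite MDP — can be sketched with plain value iteration. This is not the paper's SMT-based synthesis loop, only the standard unconstrained case; the toy MDP and its numbers are made up for illustration:

```python
# Toy value iteration on a hypothetical 2-state MDP (illustrative only;
# the paper's contribution is an SMT + probabilistic-model-checking
# framework for robust/constrained synthesis, not this baseline).

# mdp[state][action] = list of (probability, next_state, reward)
mdp = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.9, 1, 1.0), (0.1, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma = 0.9  # discount factor

# Bellman-backup iteration to an (approximate) fixed point
V = {s: 0.0 for s in mdp}
for _ in range(1000):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in mdp[s].values()
        )
        for s in mdp
    }

# Extract the greedy (reward-optimal) memoryless policy
policy = {
    s: max(mdp[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                     for p, s2, r in mdp[s][a]))
    for s in mdp
}
print(policy)
```

The robust and constrained variants the paper targets restrict which policies are admissible (e.g., by representation cost) and evaluate them against a set of perturbed MDPs, which is where the combinatorial SMT reasoning comes in.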


Leading Multiple Ad Hoc Teammates in Joint Action Settings

Agmon, Noa (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)

AAAI Conferences

The growing use of autonomous agents in practice may require agents to cooperate as a team in situations where they have limited prior knowledge about one another, cannot communicate directly, or do not share the same world models. These situations raise the need to design ad hoc team members, i.e., agents that are able to cooperate without coordination in order to reach an optimal team behavior. This paper considers the problem of leading N-agent teams by a single agent toward their optimal joint utility, where the agents compute their next actions based only on their most recent observations of their teammates' actions. We show that, in contrast to previous results for two-agent teams, in larger teams the agent might not be able to lead the team to the action with maximal joint utility. In these cases, the agent's optimal strategy leads the team to the best reachable cycle of joint actions. We describe a graphical model of the problem and a polynomial-time algorithm for solving it. We then consider the problem of leading teams where the agents base their actions on a longer history of past observations, showing that an upper bound on computation time that is exponential in the memory size is very likely to be tight.
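The memory-1 setting described above can be sketched as a simple simulation: each teammate best-responds to the most recently observed joint action, so a fixed leader action induces a deterministic dynamic that eventually settles into a reachable cycle of joint actions. The 3-agent utility table below is a hypothetical example, not from the paper:

```python
# Toy simulation of leading memory-1 teammates (hypothetical utilities;
# the paper's algorithm works on a graphical model of this dynamic).

A = [0, 1]  # each agent's action set
# U[(leader, teammate1, teammate2)] = joint utility
U = {
    (0, 0, 0): 3, (0, 0, 1): 0, (0, 1, 0): 0, (0, 1, 1): 1,
    (1, 0, 0): 0, (1, 0, 1): 2, (1, 1, 0): 2, (1, 1, 1): 5,
}

def best_response(joint, who):
    """Teammate `who` picks the action maximizing U, assuming the
    other agents repeat their most recently observed actions."""
    def util(a):
        j = list(joint)
        j[who] = a
        return U[tuple(j)]
    return max(A, key=util)

def simulate(leader_action, start=(0, 0, 0), steps=20):
    """Leader repeats one action; memory-1 teammates best-respond to
    the last observed joint action. Returns the visited joint actions."""
    joint = start
    trace = [joint]
    for _ in range(steps):
        t1 = best_response(joint, 1)
        t2 = best_response(joint, 2)
        joint = (leader_action, t1, t2)
        trace.append(joint)
    return trace

# Here a constant leader action 1 steers the team from (0, 0, 0) into
# the fixed point (1, 1, 1), the joint action with maximal utility.
print(simulate(1)[-1])
```

Because the joint-action space is finite and the dynamic is deterministic, every leader strategy reaches some cycle; the paper's contribution is characterizing when the maximal-utility joint action is reachable and computing the best reachable cycle in polynomial time.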